tissue_of_interest = "Liver"
library(here)
here() starts at /home/rstudio/tabula-muris
source(here("00_data_ingest", "02_tissue_analysis_rmd", "boilerplate.R"))
Loading required package: ggplot2
Loading required package: cowplot
Attaching package: 'cowplot'
The following object is masked from 'package:ggplot2':
ggsave
Loading required package: Matrix
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
── Attaching packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
✔ tibble 1.3.4 ✔ purrr 0.2.4
✔ tidyr 0.7.2 ✔ stringr 1.2.0
✔ readr 1.1.1 ✔ forcats 0.2.0
── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ tidyr::expand() masks Matrix::expand()
✖ dplyr::filter() masks stats::filter()
✖ cowplot::ggsave() masks ggplot2::ggsave()
✖ dplyr::lag() masks stats::lag()
tiss = load_tissue_droplet(tissue_of_interest)
[1] "Scaling data matrix"
|
| | 0%
|
|=================================================================| 100%
Visualize top genes in principal components
Later on (in FindClusters and TSNE) you will pick a number of principal components to use. This has the effect of keeping the major directions of variation in the data and, ideally, supressing noise. There is no correct answer to the number to use, but a decent rule of thumb is to go until the plot plateaus.
PCElbowPlot(object = tiss)
Choose the number of principal components to use.
# Set number of principal components.
n.pcs = 10
The clustering is performed based on a nearest neighbors graph. Cells that have similar expression will be joined together. The Louvain algorithm looks for groups of cells with high modularity–more connections within the group than between groups. The resolution parameter determines the scale. Higher resolution will give more clusters, lower resolution will give fewer.
For the top-level clustering, aim to under-cluster instead of over-cluster. It will be easy to subset groups and further analyze them below.
# Set resolution
res.used <- 3.5
tiss <- FindClusters(object = tiss, reduction.type = "pca", dims.use = 1:n.pcs,
resolution = res.used, print.output = 0, save.SNN = TRUE)
We use TSNE solely to visualize the data.
# If cells are too spread out, you can raise the perplexity. If you have few cells, try a lower perplexity (but never less than 10).
tiss <- RunTSNE(object = tiss, dims.use = 1:n.pcs, seed.use = 10, perplexity=30)
TSNEPlot(object = tiss, do.label = T, pt.size = 1.2, label.size = 4)
tiss = compare_previous_annotation(tiss, tissue_of_interest, "droplet")
Warning: Missing column names filled in: 'X1' [1]
Parsed with column specification:
cols(
X1 = col_character(),
channel = col_character(),
cell_ontology_class = col_character(),
cell_ontology_id = col_character(),
free_annotation = col_character(),
cluster.ids = col_integer()
)
8 9
93 28
duct epithelial cell endothelial cell of hepatic sinusoid
26 27
hepatocyte
1671
0 1 2 3 4 5 6 7 8
8 0 0 0 0 0 0 0 0 91
9 0 0 0 0 0 0 0 0 0
duct epithelial cell 0 0 0 0 0 0 0 0 0
endothelial cell of hepatic sinusoid 0 0 0 0 0 0 0 0 0
hepatocyte 164 140 131 124 122 113 113 112 8
9 10 11 12 13 14 15 16 17
8 0 0 0 0 1 0 1 0 0
9 0 0 0 0 0 0 0 0 0
duct epithelial cell 0 0 0 0 0 0 0 0 0
endothelial cell of hepatic sinusoid 0 0 0 0 0 0 0 0 0
hepatocyte 96 96 86 78 76 72 56 43 41
18 19 20
8 0 0 0
9 28 0 0
duct epithelial cell 0 0 26
endothelial cell of hepatic sinusoid 0 27 0
hepatocyte 0 0 0
TSNEPlot(object = tiss, do.return = TRUE, group.by = "previous_annotation")
table(tiss@meta.data[, "previous_annotation"], tiss@ident)
0 1 2 3 4 5 6 7 8
8 0 0 0 0 0 0 0 0 91
9 0 0 0 0 0 0 0 0 0
duct epithelial cell 0 0 0 0 0 0 0 0 0
endothelial cell of hepatic sinusoid 0 0 0 0 0 0 0 0 0
hepatocyte 164 140 131 124 122 113 113 112 8
9 10 11 12 13 14 15 16 17
8 0 0 0 0 1 0 1 0 0
9 0 0 0 0 0 0 0 0 0
duct epithelial cell 0 0 0 0 0 0 0 0 0
endothelial cell of hepatic sinusoid 0 0 0 0 0 0 0 0 0
hepatocyte 96 96 86 78 76 72 56 43 41
18 19 20
8 0 0 0
9 28 0 0
duct epithelial cell 0 0 26
endothelial cell of hepatic sinusoid 0 27 0
hepatocyte 0 0 0
TSNEPlot(tiss, group.by="mouse.sex")
TSNEPlot(tiss, group.by="mouse.id")
Significant genes:
hepatocyte: Alb, Ttr, Apoa1, and Serpina1c pericentral: Cyp2e1, Glul, Oat, Gulo midlobular: Ass1, Hamp, Gstp1, Ubb periportal: Cyp2f2, Pck1, Hal, Cdh1
endothelial cells: Pecam1, Nrp1, Kdr+ and Oit3+ Kuppfer cells: Emr1, Clec4f, Cd68, Irf7 NK/NKT cells: Zap70, Il2rb, Nkg7, Cxcr6, Klr1c, Gzma B cells: Cd79a, Cd79b, Cd74 and Cd19 Immune cells: Ptprc
genes_hep = c('Alb', 'Ttr', 'Apoa1', 'Serpina1c',
'Cyp2e1', 'Glul', 'Oat', 'Gulo',
'Ass1', 'Hamp', 'Gstp1', 'Ubb',
'Cyp2f2', 'Pck1', 'Hal', 'Cdh1')
genes_endo = c('Pecam1', 'Nrp1', 'Kdr','Oit3')
genes_kuppfer = c('Emr1', 'Clec4f', 'Cd68', 'Irf7')
genes_nk = c('Zap70', 'Il2rb', 'Nkg7', 'Cxcr6', 'Gzma')
genes_b = c('Cd79a', 'Cd79b', 'Cd74')
genes_bec = c('Epcam', 'Krt19', 'Krt7')
genes_immune = 'Ptprc'
all_genes = c(genes_hep, genes_endo, genes_kuppfer, genes_nk, genes_b, genes_bec, genes_immune)
Dotplots let you see the intensity of exppression and the fraction of cells expressing for each of your genes of interest. The radius shows you the percent of cells in that cluster with at least one read sequenced from that gene. The color level indicates the average Z-score of gene expression for cells in that cluster, where the scaling is done over taken over all cells in the sample.
Using the markers above, we can confidentaly label many of the clusters:
19: endothelial cells 20: bile duct epithelial cells 21: immune cells rest are hepatocytes
We will add those cell_ontology_classes to the dataset.
tiss <- StashIdent(object = tiss, save.name = "cluster.ids")
cluster.ids <- c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21)
free_annotation <- c(
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"endothelial cells",
"bile duct epithelial cells",
"immune cells")
cell_ontology_class <- c(
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"hepatocyte",
"endothelial cell of hepatic sinusoid",
"duct epithelial cell",
"leukocyte")
tiss = stash_annotations(tiss, cluster.ids, free_annotation, cell_ontology_class)
The following `from` values were not present in `x`: 21
The following `from` values were not present in `x`: 21
The following `from` values were not present in `x`: 21
Color by metadata, like plate barcode, to check for batch effects.
TSNEPlot(object = tiss, do.return = TRUE, group.by = "channel")
Let’s drill down on the hepatocytes.
subtiss = SubsetData(tiss, ident.use = c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18))
subtiss <- subtiss %>% ScaleData() %>%
FindVariableGenes(do.plot = TRUE, x.high.cutoff = Inf, y.cutoff = 0.5) %>%
RunPCA(do.print = FALSE)
[1] "Scaling data matrix"
|
| | 0%
|
|=================================================================| 100%
PCHeatmap(object = subtiss, pc.use = 1:3, cells.use = 20, do.balanced = TRUE, label.columns = FALSE, num.genes = 8)
PCElbowPlot(subtiss)
sub.n.pcs = 8
sub.res.use = 0.5
subtiss <- subtiss %>% FindClusters(reduction.type = "pca", dims.use = 1:sub.n.pcs,
resolution = sub.res.use, print.output = 0, save.SNN = TRUE) %>%
RunTSNE(dims.use = 1:sub.n.pcs, seed.use = 10, perplexity=8)
TSNEPlot(object = subtiss, do.label = T, pt.size = .5, label.size = 4)
BuildClusterTree(subtiss)
[1] "Finished averaging RNA for cluster 0"
[1] "Finished averaging RNA for cluster 1"
[1] "Finished averaging RNA for cluster 2"
[1] "Finished averaging RNA for cluster 3"
[1] "Finished averaging RNA for cluster 4"
[1] "Finished averaging RNA for cluster 5"
[1] "Finished averaging RNA for cluster 6"
[1] "Finished averaging RNA for cluster 7"
[1] "Finished averaging RNA for cluster 8"
[1] "Finished averaging RNA for cluster 9"
An object of class seurat in project Liver
23341 genes across 1792 samples.
From these genes, it appears that the clusters represent:
0: midlobular male 1: pericentral female 2: periportal female 3: periportal male 4: midlobular male 5: pericentral male 6: midlobular female 7: midlobular female
The multitude of clusters of each type correspond mostly to individual animals/sexes.
table(FetchData(subtiss, c('mouse.sex','ident')) %>% droplevels())
ident
mouse.sex 0 1 2 3 4 5 6 7 8 9
F 4 292 282 0 6 2 132 113 0 20
M 315 0 0 204 171 150 0 0 93 8
sub.cluster.ids <- c(0, 1, 2, 3, 4, 5, 6, 7)
sub.free_annotation <- c("midlobular male", "pericentral female", "periportal female", "periportal male", "midlobular male", "pericentral male", "midlobular female", "midlobular female")
sub.cell_ontology_class <- c("hepatocyte", "hepatocyte", "hepatocyte", "hepatocyte", "hepatocyte", "hepatocyte", "hepatocyte", "hepatocyte")
subtiss = stash_annotations(subtiss, sub.cluster.ids, sub.free_annotation, sub.cell_ontology_class)
tiss = stash_subtiss_in_tiss(tiss, subtiss)
Color by metadata, like plate barcode, to check for batch effects.
TSNEPlot(object = subtiss, do.return = TRUE, group.by = "mouse.sex")
Color by cell ontology class on the original TSNE.
TSNEPlot(object = tiss, do.return = TRUE, group.by = "cell_ontology_class")
filename = here('00_data_ingest', '04_tissue_robj_generated',
paste0("droplet_", tissue_of_interest, "_seurat_tiss.Robj"))
print(filename)
[1] "/home/rstudio/tabula-muris/00_data_ingest/04_tissue_robj_generated/droplet_Liver_seurat_tiss.Robj"
save(subtiss, file=filename)
# To reload a saved object
#filename = here('00_data_ingest', '04_tissue_robj_generated',
# paste0("droplet_", tissue_of_interest, "_seurat_tiss.Robj"))
#load(file=filename)
save_annotation_csv(tiss, tissue_of_interest, "droplet")